We compare three methods described in the article:
- HPSS masking method
- Wavelet approach based on Nongpiur's work
- Our proposed IS³ method
We first present examples from the generated test set, followed by examples from real-life recordings.
Examples from the generated test set¶
The first few examples come from a test set generated with the same pipeline as the training and validation datasets used in the paper.
Example 1.¶
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, Audio
from rendering.is3.dataloader_numpy import ImpulsiveStationarySeparation
sr = 44100
dataset = ImpulsiveStationarySeparation()
We first sample one example from the test set composed of:
- a stationary background track
- an impulses track
- a mixture track
The mixture track is the only input to the three methods, each of which is applied to it to separate the stationary and impulsive components.
bkg, impulse, mix, gain, norm_gain = dataset.read_scene(
    scene_index=1368, subset="test", dataset="random")
Background
Impulses
Mix
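For intuition, a mixture like the one above can be thought of as the sum of the two source tracks. Below is a minimal synthetic sketch of that composition; the actual dataset pipeline also applies gains and normalization (hence the `gain` and `norm_gain` values returned above), so this is only an illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sr_demo = 44100

# Stationary background: low-amplitude noise
bkg_demo = 0.05 * rng.standard_normal(sr_demo)

# Impulsive track: a few short, exponentially decaying bursts
impulse_demo = np.zeros(sr_demo)
for onset in (5000, 20000, 35000):
    t = np.arange(2000)
    impulse_demo[onset:onset + 2000] = (
        0.5 * np.exp(-t / 300) * rng.standard_normal(2000)
    )

# Mixture: sum of the two components
mix_demo = bkg_demo + impulse_demo
```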
HPSS with a margin of 1¶
We first apply the HPSS decomposition masking method with a margin parameter equal to 1.
# HPSS
from rendering.is3.baselines import hpss
hpss_module = hpss.HarmonicPercussiveDecomposition(
    nfft=2048,
    window_size=2048,
    overlap=0.75,
    margin=1.
)
y_p, y_h, _, _ = hpss_module.forward(mix)
print("HPSS/Impulses")
display(Audio(y_p, rate=sr))
print("HPSS/Stationary Background")
display(Audio(y_h, rate=sr))
HPSS/Impulses
HPSS/Stationary Background
Significant leakage of both the stationary and impulsive components can be heard in the separated tracks. The resonance of the impulsive sounds is poorly separated from the background, resulting in a dry-sounding impulsive track.
HPSS with a margin of 2¶
We apply the same method with a larger margin parameter, equal to 2, to enhance the separation between impulsive and stationary sounds.
# HPSS
hpss_module_2 = hpss.HarmonicPercussiveDecomposition(
    nfft=2048,
    window_size=2048,
    overlap=0.75,
    margin=2.
)
y_p_2, y_h_2, _, _ = hpss_module_2.forward(mix)
print("HPSS/Impulses")
display(Audio(y_p_2, rate=sr))
print("HPSS/Stationary Background")
display(Audio(y_h_2, rate=sr))
HPSS/Impulses
HPSS/Stationary Background
The leakage from the background is reduced on the impulsive track, but the stationary track still contains some impulsive sounds.
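Behind the margin parameter is the standard median-filtering HPSS idea: the magnitude spectrogram is median-filtered along time (harmonic/stationary estimate) and along frequency (percussive/impulsive estimate), and a bin is assigned to a component only when its estimate exceeds the other by the margin factor, so a larger margin trades leakage for completeness. A minimal numpy/scipy sketch with binary masks (the module used above may instead apply soft masks):

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(mag, kernel=17, margin=1.0):
    """Binary harmonic/percussive masks from a magnitude spectrogram."""
    harm = median_filter(mag, size=(1, kernel))  # smooth along time -> harmonic estimate
    perc = median_filter(mag, size=(kernel, 1))  # smooth along frequency -> percussive estimate
    mask_h = harm > margin * perc                # bin dominated by the harmonic part
    mask_p = perc > margin * harm                # bin dominated by the percussive part
    return mask_h, mask_p

# Toy spectrogram (freq x time): a sustained tone plus a broadband click
mag = np.zeros((64, 64))
mag[10, :] = 1.0   # horizontal line: stationary tone
mag[:, 30] = 1.0   # vertical line: impulsive click
mask_h, mask_p = hpss_masks(mag, kernel=17, margin=1.0)
```

At the crossing bin, both estimates are equal, so with margin = 1 neither mask claims it; increasing the margin additionally rejects ambiguous bins, which is why leakage decreases at the cost of drier impulses.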
Wavelet filtering¶
We now apply the wavelet filtering method based on Nongpiur's work, with our modifications added so that it also predicts an impulsive track.
from rendering.is3.baselines import wavelet_script
wavelet_module = wavelet_script.WaveletBaseline(
    wavelet="db",
    level=13,
    sr=sr,
    ks=2.,
    ks_impulse=6.,
    kc=1.,
    kernel_size=1025,
)
wavelet_bkg, wavelet_impulse = wavelet_module.forward(mix)
print("Wavelet/Impulses")
display(Audio(wavelet_impulse, rate=sr))
print("Wavelet/Stationary Background")
display(Audio(wavelet_bkg, rate=sr))
Wavelet/Impulses
Wavelet/Stationary Background
We obtain poor results on the impulsive track, with some audio artefacts. Moreover, on the stationary track, the original method proposed in Nongpiur's article only attenuates the impulsive sounds, which remain audible.
Note: the choice of parameters in this wavelet approach depends strongly on the type of impulses and the type of ambient sound (speech in the original article). We carried out a search for parameters better suited to the context of our article, but the wide variety of sound types and acoustic scenes we study means that this approach performs very unevenly from one example to another.
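The mechanism behind such threshold parameters can be illustrated with a one-level Haar decomposition: impulsive events concentrate energy in large detail coefficients, which a k·σ threshold (the role played by `ks`/`ks_impulse` above) can isolate. This is a numpy-only sketch of the general idea, not the actual `wavelet_script` implementation:

```python
import numpy as np

def haar_split(x):
    """One-level Haar transform: approximation and detail coefficients."""
    x = x[: len(x) // 2 * 2]                     # even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

rng = np.random.default_rng(1)
signal = 0.01 * rng.standard_normal(1024)        # stationary noise floor
signal[500] = 1.0                                # a single impulse

_, detail = haar_split(signal)
k = 6.0                                          # threshold factor, like ks_impulse
sigma = np.median(np.abs(detail)) / 0.6745       # robust noise estimate (MAD)
impulse_coeffs = np.abs(detail) > k * sigma      # flags impulse-dominated coefficients
```

The sensitivity to `k` and to the noise estimate is exactly why a single parameter setting performs unevenly across very different acoustic scenes.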
Proposed system IS³¶
Finally, we apply our proposed IS³ method to the mixture track.
from rendering.is3.model_wrapper import ModelWrapper
import torch
model = ModelWrapper(
    conf_name="014",
    job_id=None,
)
_ = model.eval()
y_i, y_s = model.forward(torch.tensor(mix).reshape(1, -1))
print("IS3/Impulses")
display(Audio(y_i[0].detach().numpy(), rate=sr))
print("IS3/Stationary Background")
display(Audio(y_s[0].detach().numpy(), rate=sr))
IS3/Impulses
IS3/Stationary Background
The separation is greatly improved here, with no audible leakage between the two components. The only artefact is a slight attenuation of the resonance of the impulsive sounds, which sound a little drier than in the target track.
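One way to quantify the remaining leakage and attenuation, rather than judging by ear alone, is a scale-invariant SDR between each estimate and its target track. A minimal sketch of this standard metric (not the evaluation code used in the paper):

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio, in dB."""
    target = np.asarray(target, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Project the estimate onto the target to remove the gain ambiguity
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    noise = estimate - projection
    return 10 * np.log10(
        np.dot(projection, projection) / (np.dot(noise, noise) + eps)
    )

# Scale invariance: a rescaled copy of the target scores essentially perfectly
t = np.sin(np.linspace(0, 100, 44100))
sdr_scaled = si_sdr(0.5 * t, t)
```

Applied to `y_i`/`impulse` and `y_s`/`bkg`, this would make the leakage comparison between the three methods numeric.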
fig, axs = plt.subplots(5, 1, figsize=(15, 12), sharex=True, sharey=True)
fig.suptitle('Comparison of Impulse Separation Methods')
# Plot target impulse
axs[0].plot(impulse)
axs[0].set_title('Target Impulse')
axs[0].set_ylabel('Amplitude')
# Plot HPSS (margin=1) impulse
axs[1].plot(y_p)
axs[1].set_title('HPSS (margin=1) Impulse')
axs[1].set_ylabel('Amplitude')
# Plot HPSS (margin=2) impulse
axs[2].plot(y_p_2)
axs[2].set_title('HPSS (margin=2) Impulse')
axs[2].set_ylabel('Amplitude')
# Plot Wavelet impulse
axs[3].plot(wavelet_impulse)
axs[3].set_title('Wavelet Impulse')
axs[3].set_ylabel('Amplitude')
# Plot IS3 impulse
axs[4].plot(y_i[0].detach().numpy())
axs[4].set_title('IS³ Impulse')
axs[4].set_ylabel('Amplitude')
axs[4].set_xlabel('Sample')
plt.tight_layout()
plt.show()
fig, axs = plt.subplots(5, 1, figsize=(15, 12), sharex=True, sharey=True)
fig.suptitle('Comparison of Stationary/Background Separation Methods')
# Plot target background
axs[0].plot(bkg)
axs[0].set_title('Target Background')
axs[0].set_ylabel('Amplitude')
# Plot HPSS (margin=1) background
axs[1].plot(y_h)
axs[1].set_title('HPSS (margin=1) Background')
axs[1].set_ylabel('Amplitude')
# Plot HPSS (margin=2) background
axs[2].plot(y_h_2)
axs[2].set_title('HPSS (margin=2) Background')
axs[2].set_ylabel('Amplitude')
# Plot Wavelet background
axs[3].plot(wavelet_bkg)
axs[3].set_title('Wavelet Background')
axs[3].set_ylabel('Amplitude')
# Plot IS3 background
axs[4].plot(y_s[0].detach().numpy())
axs[4].set_title('IS³ Background')
axs[4].set_ylabel('Amplitude')
axs[4].set_xlabel('Sample')
plt.tight_layout()
plt.show()